09. Serialization
Beyond accessing model attributes directly via their field names (e.g. model.foobar), models can be converted, dumped, serialized, and exported in a number of ways.
Pydantic uses the terms "serialize" and "dump" interchangeably. Both refer to the process of converting a model to a dictionary or JSON-encoded string.
model.model_dump(...)
This is the primary way of converting a model to a dictionary. Sub-models will be recursively converted to dictionaries.
The one exception to sub-models being converted to dictionaries is that RootModel and its subclasses will have the root field value dumped directly, without a wrapping dictionary. This is also done recursively.
from typing import Any, List, Optional
from pydantic import BaseModel, Field, Json
class BarModel(BaseModel):
    whatever: int
class FooBarModel(BaseModel):
    banana: Optional[float] = 1.1
    foo: str = Field(serialization_alias='foo_alias')
    bar: BarModel
m = FooBarModel(banana=3.14, foo='hello', bar={'whatever': 123})
# returns a dictionary:
print(m.model_dump())
#> {'banana': 3.14, 'foo': 'hello', 'bar': {'whatever': 123}}
print(m.model_dump(include={'foo', 'bar'}))
#> {'foo': 'hello', 'bar': {'whatever': 123}}
print(m.model_dump(exclude={'foo', 'bar'}))
#> {'banana': 3.14}
print(m.model_dump(by_alias=True))
#> {'banana': 3.14, 'foo_alias': 'hello', 'bar': {'whatever': 123}}
print(
FooBarModel(foo='hello', bar={'whatever': 123}).model_dump(
exclude_unset=True
)
)
#> {'foo': 'hello', 'bar': {'whatever': 123}}
print(
FooBarModel(banana=1.1, foo='hello', bar={'whatever': 123}).model_dump(
exclude_defaults=True
)
)
#> {'foo': 'hello', 'bar': {'whatever': 123}}
print(
FooBarModel(foo='hello', bar={'whatever': 123}).model_dump(
exclude_defaults=True
)
)
#> {'foo': 'hello', 'bar': {'whatever': 123}}
print(
FooBarModel(banana=None, foo='hello', bar={'whatever': 123}).model_dump(
exclude_none=True
)
)
#> {'foo': 'hello', 'bar': {'whatever': 123}}
class Model(BaseModel):
    x: List[Json[Any]]
print(Model(x=['{"a": 1}', '[1, 2]']).model_dump())
#> {'x': [{'a': 1}, [1, 2]]}
print(Model(x=['{"a": 1}', '[1, 2]']).model_dump(round_trip=True))
#> {'x': ['{"a":1}', '[1,2]']}
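The example above does not exercise the RootModel exception described at the start of this section. The following minimal sketch (the Pets and Owner models are hypothetical) shows a RootModel nested inside a regular model, and a RootModel dumped on its own; in both cases the root value comes out without a wrapping dictionary.

```python
from typing import List

from pydantic import BaseModel, RootModel


class Pets(RootModel[List[str]]):
    """Hypothetical root model whose whole value is a list of pet names."""


class Owner(BaseModel):
    name: str
    pets: Pets  # nested RootModel


owner = Owner(name='Alice', pets=['dog', 'cat'])

# The nested RootModel is dumped as its bare list, not as {'root': [...]}
print(owner.model_dump())
#> {'name': 'Alice', 'pets': ['dog', 'cat']}

# Dumping the RootModel itself also returns the bare root value
print(Pets(['fish']).model_dump())
#> ['fish']
```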
model.model_dump_json(...)
The .model_dump_json() method serializes a model directly to a JSON-encoded string that is equivalent to the result produced by .model_dump(mode='json').
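As a quick check of that equivalence, parsing the JSON string back should give the same data as model_dump(mode='json'), while the default Python mode keeps rich objects such as datetime. A small sketch (the Event model is made up for illustration):

```python
import json
from datetime import datetime

from pydantic import BaseModel


class Event(BaseModel):  # hypothetical model for this example
    name: str
    when: datetime


e = Event(name='launch', when=datetime(2032, 6, 1, 12, 13, 14))

# Python mode (the default) keeps the datetime object as-is
print(repr(e.model_dump()['when']))
#> datetime.datetime(2032, 6, 1, 12, 13, 14)

# JSON mode converts it to a JSON-compatible string
print(e.model_dump(mode='json'))
#> {'name': 'launch', 'when': '2032-06-01T12:13:14'}

# Parsing the JSON string yields the same data as mode='json'
print(json.loads(e.model_dump_json()) == e.model_dump(mode='json'))
#> True
```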
from datetime import datetime
from pydantic import BaseModel
class BarModel(BaseModel):
    whatever: int
class FooBarModel(BaseModel):
    foo: datetime
    bar: BarModel
m = FooBarModel(foo=datetime(2032, 6, 1, 12, 13, 14), bar={'whatever': 123})
print(m.model_dump_json())
#> {"foo":"2032-06-01T12:13:14","bar":{"whatever":123}}
print(m.model_dump_json(indent=2))
"""
{
  "foo": "2032-06-01T12:13:14",
  "bar": {
    "whatever": 123
  }
}
"""
Common parameters
Name | Type | Description | Default |
---|---|---|---|
indent | int or None | (model_dump_json only) Indentation to use in the JSON output. If None, the output is compact. | None |
mode | Literal['json', 'python'] or str | (model_dump only) The serialization mode. If 'json', the output will only contain JSON-serializable types. If 'python', the output may contain non-JSON-serializable Python objects. | 'python' |
include | IncEx | Field(s) to include in the output. | None |
exclude | IncEx | Field(s) to exclude from the output. | None |
context | dict[str, Any] or None | Additional context to pass to the serializer. | None |
by_alias | bool | Whether to serialize using field aliases. | False |
exclude_unset | bool | Whether to exclude fields that have not been explicitly set. | False |
exclude_defaults | bool | Whether to exclude fields whose value equals their default. | False |
exclude_none | bool | Whether to exclude fields whose value is None. | False |
round_trip | bool | If True, dumped values should be valid as input for non-idempotent types such as Json[T]. | False |
warnings | bool or Literal['none', 'warn', 'error'] | How to handle serialization errors. False/'none' ignores them, True/'warn' logs errors, 'error' raises a PydanticSerializationError. | True |
serialize_as_any | bool | Whether to serialize fields with duck-typing serialization behavior. | False |
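The three exclude_* flags in the table are easy to confuse. In this sketch (the Settings model is hypothetical), debug is passed explicitly but with its default value, so the flags give three different results:

```python
from typing import Optional

from pydantic import BaseModel


class Settings(BaseModel):  # hypothetical model for this example
    debug: bool = False
    name: Optional[str] = None


# debug is set explicitly, but to the same value as its default
s = Settings(debug=False)

print(s.model_dump())
#> {'debug': False, 'name': None}
print(s.model_dump(exclude_unset=True))  # name was never set
#> {'debug': False}
print(s.model_dump(exclude_defaults=True))  # both fields equal their defaults
#> {}
print(s.model_dump(exclude_none=True))  # only name is None
#> {'debug': False}
```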
dict(model) and iteration
Pydantic models can also be converted to dictionaries using dict(model), but this conversion is not recursive, so sub-models will not be converted to dictionaries.
You can also iterate over a model's fields using for field_name, field_value in model:
from pydantic import BaseModel
class BarModel(BaseModel):
    whatever: int
class FooBarModel(BaseModel):
    banana: float
    foo: str
    bar: BarModel
m = FooBarModel(banana=3.14, foo='hello', bar={'whatever': 123})
print(dict(m))
#> {'banana': 3.14, 'foo': 'hello', 'bar': BarModel(whatever=123)}
for name, value in m:
    print(f'{name}: {value}')
#> banana: 3.14
#> foo: hello
#> bar: whatever=123
Note also that RootModel does get converted to a dictionary with the key 'root'.
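To contrast the shallow dict() conversion with the recursive model_dump(), and to show the 'root' key just mentioned, here is a sketch using hypothetical Inner, Outer and Tags models:

```python
from typing import List

from pydantic import BaseModel, RootModel


class Inner(BaseModel):
    whatever: int


class Outer(BaseModel):
    inner: Inner


class Tags(RootModel[List[str]]):  # hypothetical root model
    pass


o = Outer(inner={'whatever': 1})

shallow = dict(o)  # not recursive: the sub-model stays a model instance
print(type(shallow['inner']).__name__)
#> Inner

print(o.model_dump())  # recursive: the sub-model becomes a dict
#> {'inner': {'whatever': 1}}

print(dict(Tags(['a', 'b'])))  # dict() keeps the 'root' key for a RootModel
#> {'root': ['a', 'b']}
```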
Custom serializers
Pydantic provides several functional serializers to customise how a model is serialized to a dictionary or JSON.
@field_serializer
@model_serializer
PlainSerializer
WrapSerializer
Use the @field_serializer decorator to customise how a single field is serialized, and the @model_serializer decorator to customise how the whole model is serialized:
from datetime import datetime, timedelta, timezone
from typing import Any, Dict
from pydantic import BaseModel, ConfigDict, field_serializer, model_serializer
class WithCustomEncoders(BaseModel):
    model_config = ConfigDict(ser_json_timedelta='iso8601')
    dt: datetime
    diff: timedelta
    @field_serializer('dt')
    def serialize_dt(self, dt: datetime, _info):
        return dt.timestamp()
m = WithCustomEncoders(
    dt=datetime(2032, 6, 1, tzinfo=timezone.utc), diff=timedelta(hours=100)
)
print(m.model_dump_json())
#> {"dt":1969660800.0,"diff":"P4DT4H"}
class Model(BaseModel):
    x: str
    @model_serializer
    def ser_model(self) -> Dict[str, Any]:
        return {'x': f'serialized {self.x}'}
print(Model(x='test value').model_dump_json())
#> {"x":"serialized test value"}
A single serializer can also be called on all fields by passing the special value '*' to the @field_serializer decorator.
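A minimal sketch of the '*' form, assuming every field is a str so that one function can handle them all (the Shout model is hypothetical):

```python
from pydantic import BaseModel, field_serializer


class Shout(BaseModel):  # hypothetical model: all fields are strings
    greeting: str
    farewell: str

    # '*' applies this serializer to every field of the model
    @field_serializer('*')
    def to_upper(self, value: str) -> str:
        return value.upper()


print(Shout(greeting='hello', farewell='bye').model_dump())
#> {'greeting': 'HELLO', 'farewell': 'BYE'}
```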
In addition, PlainSerializer and WrapSerializer enable you to use a function to modify the output of serialization.
Both serializers accept optional arguments including:
return_type specifies the return type for the function. If omitted, it will be inferred from the type annotation.
when_used specifies when this serializer should be used. It can be 'always', 'unless-none', 'json', or 'json-unless-none'. Defaults to 'always'.
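As an illustration of when_used, the sketch below uses 'unless-none' so that a function which would fail on None (it calls .upper()) is simply skipped for None values. The MaybeUpper alias and Note model are hypothetical:

```python
from typing import Optional

from typing_extensions import Annotated

from pydantic import BaseModel, PlainSerializer

# Hypothetical alias: upper-case strings, but leave None untouched
MaybeUpper = Annotated[
    Optional[str],
    PlainSerializer(lambda s: s.upper(), return_type=str, when_used='unless-none'),
]


class Note(BaseModel):
    title: MaybeUpper = None


print(Note(title='draft').model_dump())
#> {'title': 'DRAFT'}
print(Note().model_dump())  # serializer skipped, so None.upper() is never called
#> {'title': None}
```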
PlainSerializer uses a simple function to modify the output of serialization.
from typing_extensions import Annotated
from pydantic import BaseModel
from pydantic.functional_serializers import PlainSerializer
FancyInt = Annotated[
int, PlainSerializer(lambda x: f'{x:,}', return_type=str, when_used='json')
]
class MyModel(BaseModel):
    x: FancyInt
print(MyModel(x=1234).model_dump())
#> {'x': 1234}
print(MyModel(x=1234).model_dump(mode='json'))
#> {'x': '1,234'}
WrapSerializer receives the raw inputs along with a handler function that applies the standard serialization logic, and can modify the resulting value before returning it as the final output of serialization.
from typing import Any
from typing_extensions import Annotated
from pydantic import BaseModel, SerializerFunctionWrapHandler
from pydantic.functional_serializers import WrapSerializer
def ser_wrap(v: Any, nxt: SerializerFunctionWrapHandler) -> str:
    return f'{nxt(v + 1):,}'
FancyInt = Annotated[int, WrapSerializer(ser_wrap, when_used='json')]
class MyModel(BaseModel):
    x: FancyInt
print(MyModel(x=1234).model_dump())
#> {'x': 1234}
print(MyModel(x=1234).model_dump(mode='json'))
#> {'x': '1,235'}
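Since the handler returns the standard serialized value, a WrapSerializer can also post-process a whole sub-model. In this sketch the extra '_version' key, the add_version function and the Point/Shape models are all hypothetical; the handler turns the Point into a dict, which is then extended:

```python
from typing import Any, Dict

from typing_extensions import Annotated

from pydantic import BaseModel, SerializerFunctionWrapHandler, WrapSerializer


def add_version(value: Any, handler: SerializerFunctionWrapHandler) -> Dict[str, Any]:
    data = handler(value)  # standard serialization: the Point becomes a dict
    data['_version'] = 1  # hypothetical extra key added afterwards
    return data


class Point(BaseModel):
    x: int
    y: int


class Shape(BaseModel):
    origin: Annotated[Point, WrapSerializer(add_version)]


print(Shape(origin={'x': 1, 'y': 2}).model_dump())
#> {'origin': {'x': 1, 'y': 2, '_version': 1}}
```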
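Overriding the return type when dumping a model
Although the return value of .model_dump() is usually dict[str, Any], a @model_serializer can make it return a value of a different type: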
from pydantic import BaseModel, model_serializer
class Model(BaseModel):
    x: str
    @model_serializer
    def ser_model(self) -> str:
        return self.x
print(Model(x='not a dict').model_dump())
#> not a dict
If you want to do this and still get proper type-checking for this method, you can override .model_dump() in an if TYPE_CHECKING: block:
from typing import TYPE_CHECKING, Any
from typing_extensions import Literal
from pydantic import BaseModel, model_serializer
class Model(BaseModel):
    x: str
    @model_serializer
    def ser_model(self) -> str:
        return self.x
    if TYPE_CHECKING:
        # Ensure type checkers see the correct return type
        def model_dump(
            self,
            *,
            mode: Literal['json', 'python'] | str = 'python',
            include: Any = None,
            exclude: Any = None,
            by_alias: bool = False,
            exclude_unset: bool = False,
            exclude_defaults: bool = False,
            exclude_none: bool = False,
            round_trip: bool = False,
            warnings: bool = True,
        ) -> str:
            ...
This trick is actually used in RootModel for precisely this purpose.
Serializing subclasses
Subclasses of standard types
Subclasses of standard types are automatically dumped like their superclasses:
from datetime import date, timedelta
from typing import Any, Type
from pydantic_core import core_schema
from pydantic import BaseModel, GetCoreSchemaHandler
class DayThisYear(date):
    """
    Contrived example of a special type of date that
    takes an int and interprets it as a day in the current year
    """
    @classmethod
    def __get_pydantic_core_schema__(
        cls, source: Type[Any], handler: GetCoreSchemaHandler
    ) -> core_schema.CoreSchema:
        return core_schema.no_info_after_validator_function(
            cls.validate,
            core_schema.int_schema(),
            serialization=core_schema.format_ser_schema('%Y-%m-%d'),
        )
    @classmethod
    def validate(cls, v: int):
        return date(2023, 1, 1) + timedelta(days=v)
class FooModel(BaseModel):
    date: DayThisYear
m = FooModel(date=300)
print(m.model_dump_json())
#> {"date":"2023-10-28"}
Subclass instances for fields of BaseModel, dataclasses, TypedDict
When using fields whose annotations are themselves struct-like types (e.g., BaseModel subclasses, dataclasses, etc.), the default behavior is to serialize the attribute value as though it was an instance of the annotated type, even if it is a subclass. More specifically, only the fields from the annotated type will be included in the dumped object:
from pydantic import BaseModel
class User(BaseModel):
    name: str
class UserLogin(User):
    password: str
class OuterModel(BaseModel):
    user: User
user = UserLogin(name='pydantic', password='hunter2')
m = OuterModel(user=user)
print(m)
#> user=UserLogin(name='pydantic', password='hunter2')
print(m.model_dump()) # note: the password field is not included
#> {'user': {'name': 'pydantic'}}
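To include the subclass fields anyway, the serialize_as_any flag from the parameter table enables duck-typed serialization for a single call, and the SerializeAsAny annotation does the same for one field. A sketch repeating the models above, assuming a Pydantic 2.x release recent enough to accept serialize_as_any on model_dump (DuckOuterModel is a hypothetical name):

```python
from pydantic import BaseModel, SerializeAsAny


class User(BaseModel):
    name: str


class UserLogin(User):
    password: str


class OuterModel(BaseModel):
    user: User


class DuckOuterModel(BaseModel):  # hypothetical model
    user: SerializeAsAny[User]  # per-field duck-typed serialization


login = UserLogin(name='pydantic', password='hunter2')

# Per call (assumes a version that supports the serialize_as_any flag)
print(OuterModel(user=login).model_dump(serialize_as_any=True))
#> {'user': {'name': 'pydantic', 'password': 'hunter2'}}

# Per field, via the SerializeAsAny annotation
print(DuckOuterModel(user=login).model_dump())
#> {'user': {'name': 'pydantic', 'password': 'hunter2'}}
```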
pickle.dumps(model)
Pydantic models support efficient pickling and unpickling.
import pickle
from pydantic import BaseModel
class FooBarModel(BaseModel):
    a: str
    b: int
m = FooBarModel(a='hello', b=123)
print(m)
#> a='hello' b=123
data = pickle.dumps(m)
print(data[:20])
#> b'\x80\x04\x95\x95\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main_'
m2 = pickle.loads(data)
print(m2)
#> a='hello' b=123
Advanced include and exclude
The model_dump and model_dump_json methods support include and exclude arguments, which can be either sets or dictionaries. This allows nested selection of which fields to export:
from pydantic import BaseModel, SecretStr
class User(BaseModel):
    id: int
    username: str
    password: SecretStr
class Transaction(BaseModel):
    id: str
    user: User
    value: int
t = Transaction(
id='1234567890',
user=User(id=42, username='JohnDoe', password='hashedpassword'),
value=9876543210,
)
# using a set:
print(t.model_dump(exclude={'user', 'value'}))
#> {'id': '1234567890'}
# using a dict:
print(t.model_dump(exclude={'user': {'username', 'password'}, 'value': True}))
#> {'id': '1234567890', 'user': {'id': 42}}
print(t.model_dump(include={'id': True, 'user': {'id'}}))
#> {'id': '1234567890', 'user': {'id': 42}}
The True indicates that we want to exclude or include an entire key, just as if we included it in a set. This can be done at any depth level.
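The nested syntax also works on list fields: an integer key selects one element by index, and the special '__all__' key applies a rule to every element. A sketch with hypothetical Item and Order models:

```python
from typing import List

from pydantic import BaseModel


class Item(BaseModel):  # hypothetical models for this example
    id: int
    secret: str


class Order(BaseModel):
    ref: str
    items: List[Item]


order = Order(
    ref='A1',
    items=[{'id': 1, 'secret': 'x'}, {'id': 2, 'secret': 'y'}],
)

# '__all__' applies the nested rule to every element of the list
print(order.model_dump(exclude={'items': {'__all__': {'secret'}}}))
#> {'ref': 'A1', 'items': [{'id': 1}, {'id': 2}]}

# An integer key selects a single element by its index
print(order.model_dump(include={'items': {0: {'id'}}}))
#> {'items': [{'id': 1}]}
```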
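Model- and field-level include and exclude
In addition to the explicit exclude and include arguments passed to model_dump and model_dump_json, we can also pass exclude: bool directly to the Field constructor.
Setting exclude on the field constructor (Field(..., exclude=True)) takes priority over the exclude/include arguments of model_dump/model_dump_json: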
from pydantic import BaseModel, Field, SecretStr
class User(BaseModel):
    id: int
    username: str
    password: SecretStr = Field(..., exclude=True)
class Transaction(BaseModel):
    id: str
    value: int = Field(exclude=True)
t = Transaction(
id='1234567890',
value=9876543210,
)
print(t.model_dump())
#> {'id': '1234567890'}
print(t.model_dump(include={'id': True, 'value': True}))  # value is still excluded: Field(exclude=True) wins
#> {'id': '1234567890'}
That said, setting exclude on the field constructor (Field(..., exclude=True)) does not take priority over the exclude_unset, exclude_none, and exclude_defaults parameters of model_dump and model_dump_json:
from typing import Optional
from pydantic import BaseModel, Field
class Person(BaseModel):
    name: str
    age: Optional[int] = Field(None, exclude=False)
person = Person(name='Jeremy')
print(person.model_dump())
#> {'name': 'Jeremy', 'age': None}
print(person.model_dump(exclude_none=True))
#> {'name': 'Jeremy'}
print(person.model_dump(exclude_unset=True))
#> {'name': 'Jeremy'}
print(person.model_dump(exclude_defaults=True))
#> {'name': 'Jeremy'}
Serialization context
You can pass a context object to the serialization methods, which can be accessed from the info argument to decorated serializer functions. This is useful when you need to change serialization behavior dynamically at runtime. For example, if you wanted a field to be dumped depending on a dynamically controllable set of allowed values, this could be done by passing the allowed values by context:
from pydantic import BaseModel, SerializationInfo, field_serializer
class Model(BaseModel):
    text: str
    @field_serializer('text')
    def remove_stopwords(self, v: str, info: SerializationInfo):
        context = info.context
        if context:
            stopwords = context.get('stopwords', set())
            v = ' '.join(w for w in v.split() if w.lower() not in stopwords)
        return v
model = Model.model_construct(**{'text': 'This is an example document'})
print(model.model_dump()) # no context
#> {'text': 'This is an example document'}
print(model.model_dump(context={'stopwords': ['this', 'is', 'an']}))
#> {'text': 'example document'}
print(model.model_dump(context={'stopwords': ['document']}))
#> {'text': 'This is an example'}
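The same context argument is accepted by model_dump_json. In this sketch the 'digits' context key and the Price model are hypothetical; the serializer falls back to two decimal places when no context is given:

```python
import json

from pydantic import BaseModel, SerializationInfo, field_serializer


class Price(BaseModel):  # hypothetical model
    amount: float

    @field_serializer('amount')
    def round_amount(self, v: float, info: SerializationInfo) -> float:
        # 'digits' is a hypothetical context key chosen for this example
        digits = (info.context or {}).get('digits', 2)
        return round(v, digits)


p = Price(amount=3.14159)
print(p.model_dump())  # no context: default of 2 digits
#> {'amount': 3.14}
print(json.loads(p.model_dump_json(context={'digits': 0})))
#> {'amount': 3.0}
```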
model_copy(...)
model_copy() allows models to be duplicated (with optional updates), which is particularly useful when working with frozen models.
from pydantic import BaseModel
class BarModel(BaseModel):
    whatever: int
class FooBarModel(BaseModel):
    banana: float
    foo: str
    bar: BarModel
m = FooBarModel(banana=3.14, foo='hello', bar={'whatever': 123})
print(m.model_copy(update={'banana': 0}))
#> banana=0 foo='hello' bar=BarModel(whatever=123)
print(id(m.bar) == id(m.model_copy().bar))
#> True
# normal copy gives the same object reference for bar
print(id(m.bar) == id(m.model_copy(deep=True).bar))
#> False
# deep copy gives a new object reference for `bar`
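Two details of model_copy are easier to see in code: an update is applied even when the model is frozen, and the values passed in update are not validated. A sketch with a hypothetical frozen Point model:

```python
from pydantic import BaseModel, ConfigDict


class Point(BaseModel):  # hypothetical frozen model
    model_config = ConfigDict(frozen=True)

    x: int
    y: int


p = Point(x=1, y=2)
moved = p.model_copy(update={'x': 10})  # allowed even though Point is frozen
print(p, '|', moved)
#> x=1 y=2 | x=10 y=2

# update values are NOT validated, so a wrong type slips through unchanged
bad = p.model_copy(update={'x': 'ten'})
print(repr(bad.x))
#> 'ten'
```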